Cognateness, frequency, and vocabulary size

An interactive account of bilingual lexical acquisition

Gonzalo Garcia-Castro, Daniela S. Ávila-Varela, Ignacio Castillejo, Núria Sebastian-Galles

Word acquisition





Bilingual word acquisition: more than one word-form per referent

gos → DOG ← perro

Word acquisition



Vocabulary checklist: number/proportion of words checked by caregivers as Understands, and/or Says (e.g., CDI, Fenson et al. 1994)

         Understands   Understands & Says
chair       [ x ]            [   ]
table       [   ]            [   ]
            [   ]            [ x ]


Word acquisition

English-Spanish bilinguals: smaller English vocabulary size than monolinguals, but similar total vocabulary size (Hoff et al. 2012)

Mixed evidence on other language pairs: English-French, Catalan-Spanish, English-Dutch

Linguistic distance


Bilingual toddlers learning two typologically close languages: larger vocabulary sizes (Floccia et al. 2018)

Cognates: form-similar translation equivalents (TEs)

Cognate Non-cognate
[cat] /ˈgat-ˈgato/ [dog] /ˈgos-ˈpe.ro/

Bilinguals acquire TEs from the earliest stages of vocabulary growth (Bilson et al. 2015; Tsui et al. 2022)

Cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2022; Bosch and Ramon-Casas 2014)

Why would cognates be acquired earlier?

Parallel activation: candidate mechanism?

Lexical access is language non-selective:

Translation equivalents are co-activated, even in monolingual situations (e.g., Costa, Caramazza, and Sebastian-Galles 2000)

Dissociation between models of bilingual word processing and word acquisition

Accumulator models


Word acquisition as a continuous process of lexical consolidation (Hidaka 2013; Mollica and Piantadosi 2017)

Simulating word acquisition

Monolingual word acquisition

For participant i and word j:

\begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \\ \text{Frequency}_j &\sim \text{Poisson}(\lambda) \\ \\ \textbf{For simulations:}\\ \lambda &= 50 \\ \text{Threshold} &= 300 \\ \\ \text{Age of Acquisition}_{ij} &= \text{Age}_{i~[\text{Threshold}]} \end{aligned}

Here \text{Age}_{i~[\text{Threshold}]} is the age at which participant i's learning instances for word j reach the threshold.
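The monolingual accumulator above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code behind the slides: it assumes age is counted in whole months, that frequency is the number of learning instances per month, and it uses Knuth's multiplication method as a stand-in Poisson sampler. The names `poisson`, `age_of_acquisition`, and `simulate_monolingual`, and the `max_age` and `n_words` settings, are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def age_of_acquisition(frequency, threshold=300, max_age=40):
    """First age (in months, assumed unit) at which Age * Frequency reaches the threshold."""
    for age in range(max_age + 1):
        if age * frequency >= threshold:
            return age
    return None  # threshold not reached within the simulated age range


def simulate_monolingual(n_words=1000, lam=50, threshold=300, seed=0):
    """Sample word frequencies and return (frequencies, ages of acquisition)."""
    rng = random.Random(seed)
    frequencies = [poisson(rng, lam) for _ in range(n_words)]
    aoas = [age_of_acquisition(f, threshold) for f in frequencies]
    return frequencies, aoas


frequencies, aoas = simulate_monolingual()
acquired = [a for a in aoas if a is not None]
print(f"Mean simulated age of acquisition: {sum(acquired) / len(acquired):.1f} months")
```

Under these assumptions a word with the average frequency of 50 reaches the threshold of 300 at 6 months, and more frequent words reach it earlier.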


Bilingual word acquisition

No parallel activation

Catalan Spanish
Language exposure 60% 40%


Bilingual word acquisition

Parallel activation

Hypothesis: word-representations receive learning instances from their translations

Increment in learning instances: proportional to form-similarity (cognateness)

Cognate

Non-cognate

Including learning instances from parallel activation


\begin{aligned} \textbf{Monolinguals:} \\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \end{aligned}

\begin{aligned} \textbf{Bilinguals:} \\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Frequency}_j \cdot \text{Exposure}_i \\ &\quad + \text{Cognateness}_j \cdot \text{Learning instances}_{ij'} \end{aligned}

where j' is the translation equivalent of word j.
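A small worked example may help to see what the bilingual equation predicts. The sketch below reuses the simulation settings (frequency 50, threshold 300) and the 60%/40% Catalan/Spanish exposure split. It makes two simplifying assumptions that are not stated in the slides: both translations have the same lexical frequency, and the translation contributes only its own exposure-weighted learning instances (which avoids the circular definition between a word and its translation). The function names are hypothetical, and age of acquisition is treated as continuous.

```python
def acquisition_rate(frequency, exposure, cognateness=0.0):
    """Learning instances gained per unit of age for one word-form.

    exposure is the proportion of input in the word's own language;
    the translation is assumed to receive the remaining 1 - exposure.
    """
    own = frequency * exposure
    translation = frequency * (1.0 - exposure)
    # Parallel activation: a share of the translation's instances, scaled by cognateness
    return own + cognateness * translation


def age_of_acquisition(frequency, exposure, cognateness=0.0, threshold=300):
    """Age at which cumulative learning instances reach the threshold."""
    return threshold / acquisition_rate(frequency, exposure, cognateness)


for label, cognateness in [("non-cognate", 0.0), ("cognate", 0.5)]:
    catalan = age_of_acquisition(50, 0.6, cognateness)
    spanish = age_of_acquisition(50, 0.4, cognateness)
    print(f"{label}: Catalan {catalan:.1f}, Spanish {spanish:.1f}")
```

With these illustrative values, the lower-exposure word-form gains the most from cognateness: its age of acquisition drops from 15 to roughly 8.6, whereas the higher-exposure form drops from 10 to 7.5.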

Bilingual word acquisition

Parallel activation

Catalan Spanish
Language exposure 60% 40%

Testing predictions (observed data)

Questionnaire

Barcelona Vocabulary Questionnaire (BVQ)

Participants filled in one of four versions of the questionnaire:

  • 500 items: 250 Catalan + 250 Spanish

Short-listed (nouns): 302 translation equivalents (TE)

Participants

138,078 item responses from 366 participants

Times responded   1 time   2 times   3 times   4 times
Participants      312      42        8         4


Data analysis

Model structure


Ordinal regression model: P(Understands), P(Says)

  • No < Understands < Understands and Says

Multilevel: crossed random effects

  • Participant and Translation equivalent as grouping variables

Bayesian: probability of parameter values

P(\text{model} | \text{data}) \propto P(\text{data} | \text{model}) \times P(\text{model})
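To make the ordinal outcome concrete, the sketch below shows how a cumulative model turns a single linear predictor into probabilities for the three ordered categories (No < Understands < Understands and Says). It assumes a logit link and uses made-up threshold values; the slides do not state the link function, so this is an illustration of the general technique rather than the fitted model. The function name and arguments are hypothetical.

```python
import math


def ordinal_probabilities(eta, cut_understands, cut_says):
    """Category probabilities under a cumulative logit model.

    eta is the linear predictor (e.g. built from age, exposure, cognateness);
    cut_understands < cut_says are the two ordered thresholds (assumed values).
    """
    def inv_logit(x):
        return 1.0 / (1.0 + math.exp(-x))

    p_at_most_no = inv_logit(cut_understands - eta)       # P(response <= No)
    p_at_most_understands = inv_logit(cut_says - eta)     # P(response <= Understands)
    return {
        "No": p_at_most_no,
        "Understands": p_at_most_understands - p_at_most_no,
        "Understands and Says": 1.0 - p_at_most_understands,
    }


# Hypothetical thresholds; a larger eta shifts probability towards "Understands and Says"
for eta in (-1.0, 0.0, 1.0):
    probs = ordinal_probabilities(eta, cut_understands=-1.0, cut_says=1.0)
    print(eta, {name: round(p, 3) for name, p in probs.items()})
```

P(Understands) in the cumulative sense is then the sum of the last two categories, and P(Says) is the last category alone.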

Predictors

Predictor      Description
Age            Months
Length         Number of phonemes
Exposure       Lexical frequency × language exposure
Cognateness    Levenshtein similarity between a word-form and its translation
Interactions   Two-way and three-way interactions between age, exposure, and cognateness

Results

Posterior distribution

Predictor                          Estimate   95% HDI          p(H0)
Intercepts
  Comprehension and Production     0.438      [-0.5, 0.5]      0.088
  Comprehension                    0.936      [2.44, 0.95]     0.000
Slopes
  Age (+1 SD, 4.87 months)         0.405      [1.43, 0.45]     0.000
  Exposure (+1 SD, 1.81)           0.233      [0.8, 0.27]      0.000
  Cognateness (+1 SD, 0.26)        0.058      [0.06, 0.1]      0.037
  Length (+1 SD, 1.56 phonemes)    -0.062     [-0.35, -0.04]   0.000
  Age × Exposure                   0.071      [0.16, 0.1]      0.000
  Age × Cognateness                0.014      [0, 0.03]        0.985
  Exposure × Cognateness           -0.057     [-0.28, -0.05]   0.000
  Age × Exposure × Cognateness     -0.018     [-0.11, -0.01]   0.975

Posterior predictions


Discussion

  • Cognateness facilitates word acquisition
  • Only low-exposure words benefit from their cognate status: less dominant language receives more facilitation
  • Parallel activation as mechanism that boosts lexical consolidation: increment in cumulative learning instances
  • Catalan-Spanish: very specific population
  • Next steps: word-learning, formalisation

Appendix

Item properties

Levenshtein similarity

Phonological similarity

Levenshtein distance: number of edits for two character strings to become identical

Orthography Phonology String
Catalan porta /ˈpɔɾ.tə/ pɔɾtə
Spanish puerta /ˈpweɾ.ta/ pweɾta

Levenshtein similarity

\text{Levenshtein similarity}(A, B) = 1 - \frac{\text{lev}(A, B)}{\max(\text{length}(A), \text{length}(B))}

Catalan              Spanish               Levenshtein similarity (distance)
porta (/ˈpɔɾ.tə/)    puerta (/ˈpweɾ.ta/)   0.50 (3)
taula (/ˈtaw.lə/)    mesa* (/ˈmesa/)       0.00 (5)
cotxe (/ˈkɔ.t͡ʃə/)    coche (/ˈkot͡ʃe/)      0.40 (3)
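The similarity measure can be sketched in plain Python as follows. This is an illustrative implementation of the standard dynamic-programming Levenshtein distance, not the code used to score the questionnaire items; the function names are hypothetical. It compares the phonological strings with stress marks and syllable boundaries removed, as in the String column above, and counts each Unicode code point as one symbol, so the result for transcriptions with combining marks depends on how those are encoded.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """1 - lev(a, b) / max(length(a), length(b)); 1.0 for two empty strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


pairs = [("pɔɾtə", "pweɾta"), ("tawlə", "mesa")]
for catalan, spanish in pairs:
    distance = levenshtein(catalan, spanish)
    similarity = levenshtein_similarity(catalan, spanish)
    print(f"{catalan} / {spanish}: {similarity:.2f} ({distance})")
```

For porta/puerta this gives a distance of 3 over a maximum length of 6 (similarity 0.50), and for taula/mesa a distance of 5 over a maximum length of 5 (similarity 0.00).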

Thank you

References

Arslan, Ruben C., Matthias P. Walther, and Cyril S. Tata. 2020. “Formr: A Study Framework Allowing for Automated Feedback Generation and Complex Longitudinal Experience-Sampling Studies Using R.” Behavior Research Methods 52 (1): 376–87. https://doi.org/10.3758/s13428-019-01236-y.
Bergelson, Elika, and Daniel Swingley. 2012. “At 6–9 Months, Human Infants Know the Meanings of Many Common Nouns.” Proceedings of the National Academy of Sciences 109 (9): 3253–58. https://doi.org/10.1073/pnas.1113380109.
Bilson, Samuel, Hanako Yoshida, Crystal D Tran, Elizabeth A Woods, and Thomas T Hills. 2015. “Semantic Facilitation in Bilingual First Language Acquisition.” Cognition 140: 122–34.
Bosch, Laura, and Marta Ramon-Casas. 2014. “First Translation Equivalents in Bilingual Toddlers’ Expressive Vocabulary: Does Form Similarity Matter?” International Journal of Behavioral Development 38 (4): 317–22. https://doi.org/10.1177/0165025414532559.
Costa, Albert, Alfonso Caramazza, and Nuria Sebastian-Galles. 2000. “The Cognate Facilitation Effect: Implications for Models of Lexical Access.” Journal of Experimental Psychology: Learning, Memory, and Cognition 26: 1283–96. https://doi.org/10.1037/0278-7393.26.5.1283.
Fenson, Larry, Philip S. Dale, J. Steven Reznick, Elizabeth Bates, Donna J. Thal, Stephen J. Pethick, Michael Tomasello, Carolyn B. Mervis, and Joan Stiles. 1994. “Variability in Early Communicative Development.” Monographs of the Society for Research in Child Development 59 (5): i–185. https://doi.org/10.2307/1166093.
Floccia, Caroline, Thomas D. Sambrook, Claire Delle Luche, Rosa Kwok, Jeremy Goslin, Laurence White, Allegra Cattani, et al. 2018. “I: Introduction.” Monographs of the Society for Research in Child Development 83 (1): 7–29. https://doi.org/10.1111/mono.12348.
Hidaka, Shohei. 2013. “A Computational Model Associating Learning Process, Word Attributes, and Age of Acquisition.” PLOS ONE 8 (11): e76242. https://doi.org/10.1371/journal.pone.0076242.
Hoff, Erika, Cynthia Core, Silvia Place, Rosario Rumiche, Melissa Señor, and Marisol Parra. 2012. “Dual Language Exposure and Early Bilingual Development.” Journal of Child Language 39 (1): 1–27. https://doi.org/10.1017/S0305000910000759.
Jusczyk, P. W., and R. N. Aslin. 1995. “Infants’ Detection of the Sound Patterns of Words in Fluent Speech.” Cognitive Psychology 29 (1): 1–23. https://doi.org/10.1006/cogp.1995.1010.
Mitchell, Lori, Rachel K. Y. Tsui, and Krista Byers-Heinlein. 2022. “Cognates Are Advantaged in Early Bilingual Expressive Vocabulary Development.” PsyArXiv. https://doi.org/10.31234/osf.io/daktp.
Mollica, Francis, and Steven T. Piantadosi. 2017. “How Data Drive Early Word Learning: A Cross-Linguistic Waiting Time Analysis.” Open Mind 1 (2): 67–77. https://doi.org/10.1162/OPMI_a_00006.
Tsui, Rachel Ka-Ying, Ana Maria Gonzalez-Barrero, Esther Schott, and Krista Byers-Heinlein. 2022. “Are Translation Equivalents Special? Evidence from Simulations and Empirical Data from Bilingual Infants.” Cognition 225 (August): 105084. https://doi.org/10.1016/j.cognition.2022.105084.